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Abstract 



We propose a theoretical formalism, molecular finite automata (MFA), 
to describe individual proteins as rule-based computing machines. The 
MFA formalism provides a framework for modeling individual protein be- 
haviors and systems-level dynamics via construction of programmable and 
executable machines. Models specified within this formalism explicitly 
represent the context-sensitive dynamics of individual proteins driven by 
external inputs and represent protein-protein interactions as synchronized 
Q ' machine reconfigurations. Both deterministic and stochastic simulations 

• 1— ( , can be applied to quantitatively compute the dynamics of MFA models. 

'-*-' ■ We apply the MFA formalism to model and simulate a simple example of 

^H' a signal transduction system that involves a MAP kinase cascade and a 

scaffold protein. 
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(^ ■ 1 Introduction 

O. 

^^ . In computational systems biology, studying a complex biochemical system in- 

volving a large number of interacting proteins often relies on in silico simulations 
to analyze and predict system behaviors [T]. In recent years, computational 
models have been increasingly used in cell signaling research and have been 
developed to study various pathways 13]. However, models often fail to cap- 
C^ ' ture the mechanistic details of signal transduction systems [4]. For example, 

models sometimes inadequately account for the complexities of protein interac- 
tions, including interaction details at the level of protein sites and structural 
relationships among components of signaling proteins (S^ , particularly multisite 
protein modification in the context of multiprotein complcxation [S] . Proteins 
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in a signal-transduction system often have multiple component parts that en- 
able the protein to interact with other molecules in a modular manner [71 |S1 [S] ■ 
Models that account for the functions of the component parts of proteins (e.g, 
linear motifs and protein interaction domains) are needed to better understand 
the dynamics of signal-transduction systems [THl HI] • 

Limitations of conventional modeling approaches, which rely on explicit spec- 
ifications of chemical reaction networks, lie in both model construction and 
simulation. Conventional models are essentially specified with lists of biochem- 
ical species and their reactions. However, representing a biochemical system 
as a chemical reaction network is often cumbersome and unnecessary [12l [13] . 
Graphical rule-based modeling formalisms and associated simulation algorithms 
have been developed to represent biochemical systems in terms of formal rules 
for biomolecular interactions [ni[Tl[Tl[ini[Tl[I71[Tl[Tl]. In graphical rule- 
based modeling, graphs (or the equivalents) are used to represent molecules, 
and graph-rewriting rules (or the equivalents) are used to represent molecular 
interactions. A rule represents a molecular interaction explicitly and the reac- 
tions that can arise from the interaction implicitly, and a rule can be viewed as 
a coarse-grained description of a class of reactions. 

Two common types of protein interactions, multivalent protein binding and 
multisite post-translational protein modification, cause a combinatorial increase 
in the size of a reaction network with an increase in the number of interaction 
modules. It is usually difficult and error prone to manually construct a full-sized 
chemical reaction model. As an alternative, such conventional models can be au- 
tomatically obtained using rule-based reaction generation tools, such as BioNet- 
Gen, Virtual Cell or little b [23 (HI 121 123] , Molccuhzer or Smoldyn [H US], 
Simmune [26], or Stochastic Simulator Compiler (SSC) [22 . Unfortunately, the 
number of reactions and biochemical species implied by rules can be enormously 
large (even infinite or limited only by the number of molecules in a system) for 
rule-based models of signal transduction systems [12] . making it inefficient to 
construct, simulate and analyze conventional models derived from rules. 

In addition to formalisms based on graph rewriting, a number of theo- 
retical frameworks for biomolecular interaction systems have been proposed 
over the past decade or so to facilitate model building and simulation. De- 
spite the differences in their syntactical and grammatical structures, most for- 
malisms share a common feature: molecular entities are treated as computa- 
tional agents that interact with one another according to a collection of specific 
protocols [28j[29l|30l|3Tl[32]. For example, protein interactions have been viewed 
as concurrent processes and have been modeled with communication protocols 
by process algebras, such as 7r-calculus [IHIISD]. Many of these formalisms have 
been coupled to Gillespie's stochastic simulation algorithm [301 133] to enable 
discrete-event simulation. 

In engineering and computer science, complex dynamical systems with het- 
erogeneous, modular and reactive components are frequently modeled by state 
machines and related formal structures. In this paper, we propose a new for- 
malism, referred to as molecular finite automata (MFA), to model individual 
proteins as structured computing agents and to specify protein-protein interac- 
tions in the form of synchronized dynamics of interacting agents. The main goal 
is to provide an intuitive as well as programmable representation for biomolecu- 
lar interaction systems. The MFA formalism is developed by incorporating and 
extending the classic structure of finite automata, which is a well-established 
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Figure 1: An example finite automaton, (a) State transition diagram for a 
three-state finite automaton. A circle denotes a state. An arrow denotes a 
state transition. A letter next to an arrow denotes an input, (b) A state 
transition table for the finite automaton in panel (a). The leftmost column 
indicates possible current states. The topmost row indicates inputs. A table 
entry indicates a target state given a current state and an input. The symbol 
'-' indicates "not applicable." 

formalism that has a wide application range and for which many sophisticated 
software and hardware tools are available. As we will see, an agent within the 
MFA framework explicitly represents a protein's activity (or state) induced by 
external inputs, and a protein interaction is specified as a joint transformation of 
the states of multiple MFA agents. At the systems level, a collection of reaction 
rules is used to describe interactions among the MFA agents in a system. 

We also report simulation methods that can compute the dynamics of a 
system modeled within the MFA framework. Using the example of a MAP 
kinase cascade, we demonstrate how to apply the MFA formalism to model and 
simulate a cell signaling system. 

2 Formal model 

In the first part of this section, we introduce a representational framework us- 
ing MFAs to describe the building blocks, particularly proteins, of biomolecular 
interaction systems. In the second part, we show how to apply the MFA repre- 
sentations to construct quantitative models for cell signaling systems and how 
to compute with these models. 



2.1 Molecular entities — molecular finite automata 

The notion of finite automata, or finite state machines, is well-established in 
theoretical computer science and has been applied to model the dynamics of 
diverse discrete systems (i.e., systems with finite numbers of states) [34]. Fi- 
nite automata were traditionally developed to construct parsers and compilers, 
conduct formal verifications and mathematical proofs, and design and test soft- 
ware [311 133 [35]. Because of their simple, adaptive and intuitive structure, finite 
automata have been applied in a number of areas, including engineering systems 
design [Sj, computational linguistics ^ and communication protocols [39] . 
Interacting state machines have also been used in the field of computational 
biology to visualize and model cellular level interactions [3?. 

Our goal in this paper is to propose a formalism based on an extended struc- 
ture of finite automata that is suited for modeling biomolecular intreactions at 



the submolecular level with consideration of site-specific details. Below, we give 
a formal definition of a purely reactive finite automaton that will be extended 
for the later description of biomolecules. 

Definition 1 (finite automaton). A finite automaton is a tuple D = {S, X, S, sq), 
where S and X are finite sets of states and inputs, respectively. The function 
5 is a transition function that maps the current state along with an input in X 
into a target state, (5 : S* x X — > 5. The symbol sq denotes a start state. 

The automaton of Definition 1 is a so-called "deterministic" finite automaton 
(DFA). In a DFA, given an input, a state transition is non-ambiguous, and 
a DFA can only reside in one state at any given time. In contrast, a finite 
automaton can be nondeterministic, in which multiple state transition paths 
(or more than one target state) may exist for a single input. Here, assuming 
responses of a protein to external signals are deterministic, we focus on DFAs 
as the fundamental structures for modeling proteins. 

The above definition characterizes a finite automaton as a reactive model in 
which state transitions are induced by external events in the form of input sig- 
nals. Elements in the set of states S are represented using subscripted lower-case 
letters, S = {si, S2, ..., }. Throughout, inputs are represented using the lower- 
case alphabet, e and subscripted e's, i.e., X = {a, 6, ...,e,ei,£2, ■•■}■ A special 
input symbol e is introduced to model a non-specific external signal or a signal 
from an unknown source that causes a spontaneous state transition. In a model 
of a signaling system, such a non-specific input can be used to model molec- 
ular events such as dissociation of two bound proteins caused by background 
collisions with solvent molecules or protein modifications catalyzed by unknown 
enzymes. A finite automaton can be visually represented by a state transition 
diagram (Fig. [Ua)), a directed graph in which a node denotes a state and an 
edge denotes an input-induced transition. Equivalently, a finite automaton can 
be specified by a machine- readable state transition table (Fig. [IJb)). 

The dynamics of many reactive systems can be represented using the DFA 
structure of Definition 1. However, this structure is inefficient for describing 
signaling proteins. To extend the DFA structure to model a protein, we first 
look at the correspondence between properties of finite automata and protein 
functions. A classic finite automaton models a memoryless process, wherein a 
state transition depends only on the current state and an input. In contrast, 
protein interactions mostly happen under certain molecular contexts. To see the 
importance of molecular context, we consider allosteric regulation and protein 
complexation. Allosteric regulation of a protein or enzyme is a common mecha- 
nism in biochemistry. Protein activity in one domain is changed (either activated 
or inhibited) by binding or unbinding of an effector molecule at another site. 
The formation of heterogeneous and transient multiprotein complexes is one 
of the essential functions of protein-protein interactions in signal transduction. 
Context-sensitive interactions such as co-localization of an enzyme and one of 
its substrates control both the strength and specificity of molecular signaling. 
These features of protein interactions require an extension of the DFA structure 
beyond the representation of information only in terms of a finite number of 
states. 

To capture the contextual sensitivity of protein interactions, internal vari- 
ables are introduced to record contextual information such as information about 
binding partners or other local molecular information. An extension of Defini- 
tion I should also include functions that will read and modify the machine 



Table 1: Operators and symbols used in MFA structures 

Operator/symbol Definition 

X :— a Assignment of a value a to a variable x 

X — a Comparison between x and a 

I a Delimiter that precedes an operation a 

\a Delimiter that precedes a predicate a 

a.b Component operator: 6 is a member of a 

A-B Bond association between A and B 

X -^ A Mapping input x to machine A 



variables. Along with state transitions, these machine operations update the 
configuration of a protein. Based on these considerations, we define an en- 
hanced automaton structure, an "extended finite automaton" (EFA) to amend 
the classic finite automata structure. 

Definition 2 (extended finite automaton). An extended finite automaton 
is a tuple E — {S, X,S, so,v), where S and X are finite sets of states and 
inputs, respectively. The transition function, S : S x X\P(y) — > S/A{v), maps 
the current state along with an input in X into a target state upon evaluation 
of a predicate function P(v), and performs an operation A{-v) on the variable 
structure v along with the state transition. The symbol Sq denotes a start state. 

The meanings of operators (e.g., \ and /) used in the above definition are 
given in Table [TJ Our definition of EFA is close to the convention of an extended 
finite state machine |36j . which also involves operations on internal variables. 
Suppose that an EFA E is in state s. Upon receiving an input x, E undergoes 
a transition 6 = {s,q,x,P(v),A{v)), where s and q are the source and target 
states, respectively. If the predicate P(v) is true (e.g., an evaluation of variables 
in V indicates that the transition is legitimate), E moves to the target state q 
and performs an operation ^(v) on the variable structure v. 

Ultimately, another important and ubiquitous feature of cell signaling, site- 
specific interactions, must be incorporated to reflect the modularity of protein 
interactions. Many signaling proteins possess multiple functional motifs, do- 
mains and sites, which serve as modules for combinatoric protein organizations 
that can potentially generate diverse signaling patterns. A realistic protein au- 
tomaton should express dynamics at the level of protein sites. To this end, we 
arrive at the definition of "molecular finite automaton" (MFA), which models 
the discrete dynamics of a multidomain biomolecule. The relationship between 
MFA and EFA is as follows: (1) an MFA contains one or multiple EFAs and (2) 
each EFA in an MFA operates on a common variable structure that is shared by 
all EFAs. Table [2] summarizes a conceptual mapping between protein functions 
and the structure of an MFA. 

Definition 3 (molecular finite automaton). A molecular finite automaton 
is a tuple M = {Ei, E2, ■.., Enj'v), which is composed of n component EFAs 
and a shared variable structure v. The transition function for Ei is 6i : Si x 

X^\P^{^^) ^ 5,M,(v). 

Table [T] lists a set of operators and symbols that we will use to describe 
MFAs in state transition diagrams and state transition tables. In essence, the 
MFA structure encapsulates multiple finite automata and allows for a hierarchi- 
cal description of the component substructures of proteins. Internal variables 



Table 2: Molecular finite automaton and protein function 

MFA component Protein 

State Conformation 

State transition Conformation change 

Input Biochemical interaction 

Variable and predicate Molecular context 

Component machine Domain or site 



and predicates help to compress the state space and make an MFA more acces- 
sible to intuitive understanding. Without using internal variables and predicate 
functions, one can still build an MFA by expanding the state space assuming 
that variables store information of finite size. However, such an approach may 
result in a state expansion that might become intractable for a complex system. 

The construction of an MFA requires knowledge and/or a hypothesis about 
the biochemistry and the component substructure of the protein one wants to 
model. Although some proteins have established functions in well-studied sig- 
naling pathways, biochemical mechanisms for many protein functions still await 
characterization. For a protein with known structure and function, the corre- 
sponding MFA must be designed to faithfully reproduce the reactive dynamics 
of the protein. For a poorly characterized protein, building an MFA, as in 
building any model, provides an opportunity to generate testable hypotheses. 

As a design issue, MFA models can be constructed with a great deal of 
flexibility. Equivalent MFAs may differ in the number of states and the topology 
of state transition diagrams. A protein with multiple sites can be modeled by an 
MFA that has separate finite automata, each of which describes the dynamics of 
a domain. Equivalently, instead of using one automaton to model one protein 
site, the protein can be modeled by an MFA that consists of a single finite 
automaton that describes the combined behavior of all sites. For example, 
if a biomolecule has three independent domains that interact with different 
binding partners, it can be modeled as one eight-state finite automaton plus a 
variable structure (Fig. HJa)), where state si indicates that the molecule is in 
a free form with no binding partners and state sg indicates that all sites are 
occupied. Alternatively, the biomolecule can be modeled with three two-state 
(a free state and a bound state) EFAs with each EFA describing an individual 
binding domain (Fig.[21[b)). For the case of three identical and non-cooperative 
binding domains, it may be preferable to model the protein with a four-state 
finite automaton with the state space S = {si : free, S2 : singly-bound, S3 : 
doubly-bound, S4 : triply-bound} for a parsimonious structure in terms of the 
number of states (Fig. [He)). This four-state MFA can be further compressed 
to a two-state model as shown in Fig. HKd), where si denotes the unoccupied 
state and Si denotes the protein is occupied at least on one of its three sites. 
In this model, the information about how many sites are bound is resolved by 
a variable c serving as a counter. 

The state space of an MFA, S]\i, is a subset of the product of the state spaces 
of component EFAs, i.e., Sm C 5i x ^2 x ... x S'„, where the two sides achieve 
equality when all component EFAs are independent. The input set of an MFA 
is a union Xm = Ui=i -^i- ^'^^ ^^ input x € Xm, the transition function Si is 
chosen if x only belongs to A^. A transition function is chosen arbitrarily if x 
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Figure 2: A protein with three binding domains modeled by different MFA 
structures, (a) An MFA that uses a single eight-state EFA to model the overall 
state transitions. Unlabeled transitions are induced by the same inputs as those 
identified for parallel transitions, (b) An MFA that models the protein with 
three independent internal finite automata, each of which interacts with distinct 
binding partners implied by input symbols {a, 6, c}. Spontaneous inputs (ei, 62 
and £3) are distinguished for non-identical individual EFAs. (c) A four-state 
MFA that models the protein as having three independent and identical binding 
sites. Machine variables that register binding partners are not shown, (d) A 
two-state MFA that can replace the model in (c). The two states si and S2 
represent "free" and "bound" , and the variable c counts the number of bound 
sites. 



belongs to input sets of multiple component EFAs. For example, if x G XiOXj, 
either Si from Ei or Sj from Ej can be equivalently chosen to react to the input 
X. This scenario of relaying inputs corresponds to the case where a protein has 
multiple domains that interact with identical partners. 

To present a biological example. Fig. [3] shows the state-transition diagram 
of an MFA model of the high-affinity IgE receptor, FceRI, in the model of 
Goldstein et al. PUJ and Faeder et al. [5Tj. The state transition table for the 
MFA representation of FceRI is shown in Table |31 The receptor molecule FceRI 
has three functional domains: (1) an extracellular a subunit responsible for 
binding its ligand, IgE (in fact, an IgE dimer is considered in the models in 
Refs. [101 E]); (2) an intracellular /3 subunit that constitutively binds to Src- 
family protein tyrosine kinase Lyn when it is unphosphorylated and recruits 
Lyn with higher affinity upon phosphorylation; and (3) an intracellular 7 sub- 
unit that recruits another protein tyrosine kinase Syk upon phosphorylation. 
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Figure 3: State transition diagram for the MFA of a receptor FceRI with three 
component EFAs. a subunit: unbound (si), bound (52); /3 subunit: unbound 
and unphosphorylated (si), bound and unphosphorylated (S2), unbound and 
phosphorylated (S3), and bound and phosphorylated (34); 7 subunit; unbound 
and unphosphorylated (si), unbound and phosphorylated {S2), and bound and 
phosphorylated (S3). Internal variables Va, vp and v^ record binding partners of 
the a, l3 and 7 subunits, respectively. Inputs and operations (if any) are labeled 
together on the transition edges, separated by a delimiter symbol / (cf. Table[T]). 
For example, Ei/wq := (f) indicates that the MFA receives a non-specific input 
£1 and then sets the variable Uq. to the null value 0. 



The machine for FceRI has three variables, v — (wq,?;^, w^), which record the 
labels (id's) of binding partners for each of the corresponding domains. We 
note that recording binding partners using internal variables is equivalent to 
constructing an adjacency list to store an undirected graph. Tracking protein 
connectivity by such means allows protein complexes to be represented implic- 
itly. The connectivity of proteins within a complex can be retrieved by a graph 
traversal. In the FceRI pathway, some protein state transitions only happen in 
specific molecular contexts. For example, crosslinking of two receptors by an 
IgE dimer initiates signaling. On the cytoplasmic side of a crosslinked receptor 
dimer, a /3 subunit-associated Lyn can transphosphorylate the /3 subunit of the 
other receptor to initiate an intracellular signaling cascade [H] . To incorporate 
such non-local contextual information into the MFA-based pathway model of a 
signaling system, one needs to specify reaction rules. In the following section, 
we introduce a formal definition of reaction rules, which are used to describe 
interactions between proteins modeled by MFAs. 



2.2 Molecular interactions — reaction rules 

An MFA is essentially a discrete state model that characterizes a protein as 
a reactive agent with state transition protocols. Since specification of an MFA 
structure does not require consideration of the modeled protein within the larger 
context of a signaling system, explicit rules are needed to connect individual 
types of MFAs as parts of an interacting system. 

A protein-protein interaction system is composed of a collection of MFAs for 
different types of proteins in the system. To describe the interactions between 
these MFAs in terms of biochemical reactions, one can specify protein interac- 
tions for the MFAs by means of reaction rules. An interaction between proteins 
changes the states of all participating molecules. In other words, a reaction 



Table 3: State transition table for the MFA of FceRI. 
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a null symbol indicating a free site. An id is a label assigned to identify an individual 
MFA agent among a population of agents of one type. 



synchronizes state transitions and machine reconfigurations among participant 
MFAs. We can view a reaction rule for an MFA-based interaction as a speci- 
fication of synchronized state transitions and operations on internal variables. 
We formally define a reaction rule as follows. 

Definition 4 (reaction rule). A reaction rule is an injective function R : 
X ->• M\P. The sets X = {xi,X2, ■■■,Xn} and M = {Mi, M2, ..., M„} are 
ordered and contain inputs and MFAs, respectively. P is a predicate for the 
mapping. 

The mapping X — >■ M simultaneously sends each individual input Xi to 
machine Mi and executes the machine reconfigurations if Mi is responsive to 
Xi. The predicate P specifies an application condition for the mapping, which 
usually constitutes non-local molecular contexts (or patterns). Although the 
number of MFAs simultaneously involved in a reaction could in principle be any 
finite number n, we focus on two types of elementary interactions: (f ) unimolec- 
ular interactions that involve state transitions of one MFA and (2) bimolecular 
interactions that involve synchronized state transitions of two MFAs. Execution 
of a reaction rule changes the configurations of participant MFAs according to 
the protocols defined in the state transition tables of the MFAs. The above 
definition of a reaction rule requires that MFA agents be in states that can 
respond to the inputs. Together with the state transition tables for individual 
MFA types, reaction rules provide executable and programmable protocols to 
connect standalone MFAs into an interacting system. 

In the example model of Fig.^J all interactions in the model can be specified 
by four reaction rules (Table |4]). Phosphorylation and dephosphorylation of 
automaton A are approximated as unimolecular, single-step reactions, which 
can be defined by two rules, Ri : {ei} —!■ {A} and R2 : {£2} —^ {A}, respectively. 
In these cases, a rule is merely a local pairing of a current state and an input for 
an MFA. The state transition and its associated operations follow the protocol 
defined in the state transition table. For a bimolecular association reaction 
between automata A and B, a reaction rule can be formulated as R3: {a, a} — >■ 
{A,B}, where the MFA set and the input set are both ordered and have a 
one-to-one mapping. We note that the definition of a reaction rule does not 
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Figure 4: Interactions between MFAs. An example system that is modeled by 
two interacting MFAs, A and B. Automaton A models a protein that can, upon 
phosphorylation, bind to another protein modeled by Automaton B. Automa- 
ton A has three states: free and unphosphorylated (si), free and phosphorylated 
(52)) and bound and phosphorylated (53). Automaton B has two states: free 
(si) and bound (52)- Internal variable w's in A and B are used to record infor- 
mation about binding partners (i.e., the label of an MFA agent, or (p if free). 



specify machine states and therefore only MFAs in proper states will respond to 
an input. A rate law specifies a mathematical formula to calculate the kinetic 
rate for a reaction rule, which can be used for quantitative simulations. We 
note that only machines in states designated by a reaction rule are accounted 
for when one calculates the rate according to the rate law. For example, R3 
in Table |4] specifies a mass action rate law for the association reaction between 
protein A and B, r3{t) = k3A{-)B{-), where A(-) and B{-) represent the eligible 
populations of protein A and B. Eligible machine states in an MFA can be 
automatically resolved by searching the state transition table for responsive 
states with regard to the input symbol specified in the rule. In this case, since 
the eligible machine states for this reaction rule are S2 and si for A and B, 
respectively, the actual rate should be calculated as r^ — k3A{s2)B{si). The 
dissociation rule, R4 : {£3, e} -^ {A, B}\A-B, has a predicate A-B that requires 
A and B must share a bond. The operator '-' denotes a bond association 
between the two machines. 

In summary, a list of reaction rules assumes three roles: (1) Assigning rate 
laws for quantitative computation; (2) synchronizing state transitions for bi- 
molecular reactions; and (3) making a modeling choice to decide which subsets 
of machine transitions are to be included in a system, in which the specification 
of a set of reaction rules reflects the choice of modeling assumptions and scope. 
For example, some state transitions may never be triggered by a given set of 
reaction rules even though these transitions may be possible at the machine 
level. 



3 Quantitative modeling 

Reaction rules are essentially specifications of coupled chemical processes that 
can be taken to change the configuration of a system in time. A set of rules 
can be translated into quantitative models if the rules can be associated with 
rates via rate laws. A straightforward way to translate a rule-based model 
into a quantitative model is to automatically generate a conventional chemical 
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Table 4: Formal reaction rules for the model of Fig. 3] 

Rule description Formal specification Rate law 



R2 

R3 

i?4 



Phosphorylation of A {ei} ^ {A} ri(t) = fciA(-) 

Dephosphorylation of A {£2} —^ {A} T2{t) = k2A{-) 

A and _B association {a, a} — >■ {A, _B} r3{t) = k-iA{-)B{-) 

A and B dissociation {e3,e} — > {A, B}\A-B r4(t) = k4,A{-), or fc4_B(-) 



Rules are shown as one-to-one mappings between ordered sets (e.g., R4 : {ea, e} — >■ {A, B} 
indicates that £3 is an input for A and e is an input for B.) A-B denotes that machines A 
and B have a bond association. 



reaction network by evaluating reaction rules using a rewriting approach [201 116) . 
However, a far more efficient approach is to use reaction rules to directly perform 
a simulation. Below, we describe how to construct and simulate models specified 
in terms of MFA structures, for either deterministic or stochastic simulation. 

3.1 Deterministic simulation 

A biochemical reaction system is conventionally modeled using coupled ordi- 
nary differential equations (ODEs) that describe the temporal evolution of all 
chemical species in the system. Here, we demonstrate that one can use a set of 
ODEs to instead describe the population dynamics of MFA states. In fact, an 
MFA state (or a combination of states) corresponds to an ensemble of chemical 
species, which often manifests as an experimental observable, such as free pro- 
tein concentration or protein phosphorylation level. This idea is related to the 
concept of model reduction [TH EH |43l EH |45] , in which a reduced system of 
ODEs is formulated to describe the dynamics of a set of observable quantities 
instead of concentrations of an exhaustive set of chemical species. 

In a most general form, we can write the following ODE to model the pop- 
ulation level of the agents of MFA M in state s, denoted as M{s,t): 

'-^^rUt)-r.At). (1) 

where Hn (j'out) is the rate of population influx (outflux), consistent with the 
rate laws associated with the transition rules related to state s. For a single MFA 
agent, the above equation describes the time rate of change of the probability 
to find the machine in state s. 

Considering the simple example model illustrated in Fig. 21 we assume the 
law of mass action for protein association reactions and single-step protein phos- 
phorylation and dephosphorylation reactions. The following differential equa- 
tion can be used to describe the concentration of protein A in the machine state 

S2, ^(52): 



dt 



ri{t) + ri{t) ~ {r2it) + r^it)) , (2) 



where X{si, t) is the concentration of protein X in its machine state Si at time 
t. The parameter ki is the rate constant for an elementary reaction process 
defined by Rule i, and r^ is the overall reaction rate for Rule i (Tabled]). One 
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can systematically write down ODEs for other MFA states for the model of 
Fig. m We assume the total numbers of automata A and B are conserved, 
i.e., Atot = A{si,t) + A{s2,t) + A{s3,t) and Btot = B{si,t) + B{s2,t). Note 
that A{s3,t) — B{s2,t), the number of bonds formed between automata A and 
B. Based on these constraints, only one more independent ODE is needed to 
describe the whole system: 

^^^^ - k:iA{s2,t)B{s^,t) - kU{s^,t) 

= r,{t)~r,{t). (3) 

The above procedure of constructing a system of ODEs can be automated. 
In some systems modeled by MFAs, the construction of ODEs may not be 
straightforward. For example, if an MFA state transition depends on the status 
of a predicate evaluation, the calculation of the transition rate needs to be 
adjusted to account for the outcome of predicate evaluation. It is in many cases 
a difficult task to accurately account for rates of conditional transitions. We 
will discuss this issue below when we consider an example model for a MAPK 
cascade with a scaffold protein. 

3.2 Stochastic simulation 

The temporal dynamics of a biochemical reaction system can be modeled as 
a continuous-time discrete-state Markov process to account for the evolution 
of the system configuration, which can be described by the following master 
equation [IB] : 

^^ = J2 [w{c\c')pic',t) -p{c,t)w{c'\c)] , (4) 

where p(c, t) is the probability that the system is found in configuration c, and 
w{c'\c) gives the transition rate from configuration c to c'. In a chemical reaction 
system, a configuration is defined by the concentrations of all chemical species. 
In a system specified by MFAs, a configuration c is determined by the states 
and connectivities of MFAs. More precisely, the configuration is given by the 
properties of the individual MFA agents in the system, including the states and 
internal variables of these MFA agents. Analytical solutions to the above master 
equation are only possible for very simple systems. For a typical system, direct 
numerical integrations of the master equation is often intractable because of 
the enormous size of the configuration space. Kinetic Monte Carlo simulation 
is applied to conduct sequential random walks through the configuration space 
and to obtain stochastic trajectories for a system of interest. 

A system of rate processes can be simulated by the classic kinetic Monte 
Carlo method [37]. In our case, coupled processes in a biomolecular interaction 
system are defined by reaction rules that proceed in time at rates r. These rule 
rates are determined by the current configuration of the system. At each time 
step, the waiting time r for the next reaction event can be sampled from an 
exponential distribution with a mean waiting time of l/rtotj where rtot — X^i ''i 
is the overall reaction rate of the system. To select a process that generates the 
next reaction event, one can sample a rule i proportional to its rate r^ [15] . 
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For rule-based models defined by MFA structures, additional sampling steps 
are needed in each step to identify MFA agents that should undergo state transi- 
tions. Below, we outline a kinetic Monte Carlo algorithm for simulating systems 
specified in terms of MFAs. 

[Step 1] Initialization. Set time t = 0, set the initial states and copy numbers 
of individual MFA agents, specify the rate constants of rate laws associated with 
reaction rules; calculate rule rates r, and specify stopping criteria. 

[Step 2] At each time step, select a rule i and a waiting time r, and update 
the time f ■(— i -t- r. 

[Step 3] Sample MFA agents from MFA candidates that are in permissible 
configurations as specified in the reaction rule sampled in Step 2, execute the 
state transitions of the sampled MFA agents, and recalculate the rate vector r. 

[Step 4] Repeat Steps 2 and 3 until a stopping criterion is satisfied. 

In the above algorithm. Step 3 describes an agent-based simulation that 
tracks the states of individual proteins modeled by MFAs. A simulation pro- 
duces single-protein configurations with details about reactive sites as well as 
connections between proteins. 

Several general kinetic Monte Carlo methods for simulation of rule-based 
models have recently been developed [IHl EH [SDl [SU \52\ , which can be read- 
ily adapted to suit the MFA framework. A rule-based kinetic Monte Carlo 
simulation involves sampling molecular agents or agent components that are 
permissible for transformation according to a rule |48] . In simulating an agent- 
based MFA model, after a rule is sampled in Step 2, the algorithm described 
above involves searching for reactant agents in a population of candidate MFA 
agents that satisfy the rule protocol. The selected rule is executed by trans- 
forming the sampled MFA agents (as in Step 3 in the above procedure). MFA 
transformations are executed by sending input signals as specified in the rule to 
the sampled MFA agents. 

In some situations, the individual states of proteins and the connections of 
proteins within protein complexes may not be of interest, or such information 
may not be experimentally resolvable to verify the predictions generated by an 
agent-based simulation. In these cases, A kinetic Monte Carlo procedure that 
incorporates only observable quantities may be adopted. A kinetic Monte Carlo 
simulation can proceed as long as one is able to update the rates of reaction rules 
for each time step, which only requires tracking the populations of those machine 
states indicated in the rules (see Tabled]). These local MFA states usually consist 
of experimentally accessible quantities such as the number of protein bonds or 
phosphorylated sites. Other quantities, such as the concentration of a complex, 
may in some cases be synthesized from basic MFA configurations. 

4 Example: MAPK cascade 

In this section, we use the example of a scaffold-mediated MAPK cascade to 
demonstrate how to construct and simulate an MFA-based model of a signaling 
pathway. 

The system is inspired by the scaffold- mediated MAPK cascade in yeast. 
The scaffold protein Ste5 possesses three domains that bind the MAP kinases 
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Figure 5: MFA state transition diagrams for proteins in a MAPK cascade with 
a scaffold protein. The scaffold protein has three submolecular binding sites: 
the a, /3 and 7 sites that bind to M3K, M2K and MPK, respectively. State si 
(S2) for each scaffold site indicates that the site is free (bound). Each kinase has 
four states: si(free and unphosphorylated) , S2 (bound and unphosphorylated) , 
S3 (bound and phosphorylated), and S4 (free and phosphorylated) . Internal 
variables (««, vp, Vj in the scaffold, and w's in the kinases) record contextual 
information, in particular, record binding partners. Note that the names of 
inputs and internal variables are local to the MFAs in which they appear. In 
other words, inputs or internal variables in different MFAs with the same name 
are not identical. 



Stell (MAPKKK), Ste7 (MAPKK) and Fus3 (MAPK) in the signaling pathway 
for the mating response ^53j . We consider a scaffold protein with three inde- 
pendent binding sites: a, /3 and 7 sites, and three MAP kinases: a MAPKKK 
that binds to the a site of the scaffold protein, a MAPKK that binds to the 
/3 site of the scaffold protein and can be phosphorylated by MAPKKK, and a 
MAPK that binds to the 7 site of the scaffold protein and can be phosphory- 
lated by MAPKK. Wc assume that (1) binding reactions of the different kinases 
to the scaffold protein are independent processes, (2) MAPKKK can only be 
phosphorylated when it is bound to the scaffold protein, (3) phosphorylation of 
MAPKK (MAPK) can happen only when its kinase, phosphorylated MAPKKK 
(MAPKK), colocates on the same scaffold protein, and (4) phosphorylation and 
dephosphorylation of kinases can be modeled as single-step processes. 

To simplify notations, we will use SCF to denote the scaffold protein and 
M3K, M2K and MPK to denote MAPKKK, MAPKK and MAPK, respectively. 
Figure [5] illustrates state transition diagrams of MFA models for the four pro- 
teins involved in the system. A total of 12 reaction rules describing protein 
interactions in the system are listed in Table [5] 

In particular, we note that assumption (3) above requires checking molecular 
contexts to determine if protein state transitions are permissible, which exem- 
plifies one important feature of MFA framework that allows modeling context- 
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sensitive interactions. Consequently, Rules 9 and 11 in Table [5] require predicate 
evaluations that examine nonlocal machine configurations. For example, Rule 9 
validates an M2K agent by checking whether it is bound to an SCF agent that 
in turn is bound to an M3K agent in S3 . This contextual information is provided 
in the rule as a pattern of bond associations, M2K-SCF-M3K(s3). In general, 
a pattern of a multiprotein complex can be specified by a data structure that 
represents a connectivity graph. 

4.1 Deterministic ODEs 

We first show how to write deterministic ODEs to describe the evolution of the 
population levels of MFA states. Our MFA model of the MAPK cascade consists 
of 15 independent machine states, compared to a total 33 distinct chemical 
species implied by the same set of reaction rules. Furthermore, we assume that 
the total amount of each protein is conserved, which corresponds to Mtot = 
Sr=i -^(■Si) for an MFA with n states, where Mtot is the total number of the 
MFA agents. Kinetic parameters that appear in the following equations are 
defined in Table [51 The following equation characterizes M3K binding to the a 
site of the scaffold protein: 

_f!ifiZ ^ -fci[M3K(si) + M3K(s4)]SCF.a(si) + fc2SCF.a(s2) • (5) 

The equations for M2K and MPK binding to the /3 and 7 sites of the scaffold are 
similar. Together with two algebraic relationships, M3Ktot — J2i=i M3K(si) 
and SCF.a(s2) = M3K(s2) + M3K(s3), two additional ODEs are needed to 
completely account for the populations of the four possible states of machine 
M3K: 



dM3K(si) 

di 
dM3K(s2) 

di 



fc2M3K(s2) + fc8M3K(s4) - /ciSCF.a(si)M3K(si) (6) 

fciSCF.a(si)M3K(si) + fc8M3K(s3) - {k^ + /c7)M3K(s2) (7) 



For the case of M2K or MPK, because phosphorylation of a kinase bound to the 
scaffold (transition from state S2 to S3) is a conditional process that requires 
colocalization of an upstream kinase on the same scaffold protein, only a fraction 
of all kinases in state S2 are candidates for transition to S3. One can use the MFA 
state transition diagrams of Fig.[5]to write the ODEs that track the populations 
of states si and S2 for M2K as follows: 

dM2K(si) ^ fc4M2K(s2) + /sioM2K(s4) - fc3SCF.^(si)M2K(si) (8) 

dt 

dM2K(s2) 

Jt 



= fc3SCF./3(si)M2K(si) + fcioM2K(s3) - (^4 + fc9e'i)M2K(s2|9) 



Similar ODEs can be written to describe the dynamics of MPK states. The 
handling of Rule 9 (11) in Table [5] for M2K (MPK) phosphorylation requires 
special attention. Not all M2Ks (or MPKs) in state S2 are subject to transition 
to S3 at a given time. The eligible fraction must satisfy the model assumption 
that requires colocalization of an M2K (MPK) with its enzyme, a phosphory- 
lated M3K (M2K), on the same scaffold protein. The factors 9i and 62 are 
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Table 5: Reaction rules for the MAPK cascade with a scaffold protein 



Rule description 


Formal specification 


Rate law 


1. M3K recruitment 


{a, a}-i.{SCF.o,M3K} 


fciM3K(-)SCF.a(-) 


2. M3K dissociation 


{ei, ei}->{SCF.a,M3K}\SCF.a-M3K 


fc2SCF.a{-) 


3. M2K recruitment 


{fe,a}^.{SCF./3,M2K} 


fc3M2K(-)SCF./3(-) 


4. M2K dissociation 


{£2, ei}-i-{SCF./3,M2K}\SCF./3-M2K 


kiSCF.pi-) 


5. MPK recruitment 


{c,a}^{SCF.7,MPK} 


A;5MPK(-)SCF.7(-) 


6. MPK dissociation 


{£3, £i}^{SCF.7,MPK}\SCF.7-MPK 


fc6SCF.7(-) 


7. M3K phosphorylation 


{e2}^{M3K} 


fc7M3K(-) 


8. M3K dephosphorylation 


{£3}^{M3K} 


fc8M3K(-) 


9. M2K phosphorylation 


{fe}-)-{M2K}\M2K-SCF-M3K(s3 ) 


fc9M2K(-) 


10. M2K dephosphorylation 


{e2}^{M2K} 


fcioM2K{-) 


11. MPK phosphorylation 


{fe}-)-{MPK}\MPK-SCF-M2K(s3) 


A;iiMPK(-) 


12. MPK dephosphorylation 


{£2}->{MPK} 


A;i2MPK(-) 



introduced to account for the fractions of M2K(s2) and MPK(s2) that are eli- 
gible for transition to state S3. In general, it is non-trivial to obtain analytical 
equations for factors such as 9i and 02- In this example, we simply approximate 
these factors as follows: 0i w M3K(s3)/SCFtot and 02 « M2K(s3)/SCFtot- 
These approximations are exact only if kinase phosphorylation reactions are 
independent and context insensitive. Figure IHl shows results from determinis- 
tic ODE simulations, compared to those from kinetic Monte Carlo simulations 
(performed as described below). We note that the stochastic simulation results 
are exact. The time trajectories for phosphorylated M3K from the deterministic 
and the stochastic simulations agree with each other on average. However, the 
ODE-based simulations of M2K and MPK phosphorylation deviate from those 
of the exact stochastic simulations due to the approximations used for 0i and 02 
(Fig. [6Kb)). Writing ODEs to directly describe MFA states as variables provides 
a way of quickly constructing a quantitative (sometimes approximate) model for 
simulation. One can avoid using approximations by expanding reaction rules 
into a chemical reaction network [iniHO]- In such cases, the variables in these 
ODEs would correspond to concentrations of chemical species rather than MFA 
states. Generating these ODEs may be impractical, because the number of 
ODEs needed to capture the dynamics of the chemical species implied by a set 
of MFAs can be much larger than the number of ODEs needed to capture the 
dynamics of MFA states. 

4.2 Kinetic Monte Carlo simulation 

The stochastic simulations were carried out using the kinetic Monte Carlo al- 
gorithm described in the previous section. The system-specific implementation 
was an agent-based simulation procedure that samples the reaction rule list in 
Tableland transforms individual MFA agents. We note that M2Ke(s2) and 
MPKg{s2) in Rules 9 and 11 (Table [5]), in contrast to the approximations used 
in the ODE simulations, are updated by on-the-fly bookkeeping that tracks re- 
action events immediately coupled to the two rules. The bookkeeping accounts 
for the numbers of M2K and MPK kinases in state S2 that are eligible for 
the transitions specified by Rules 9 and 11. This implementation corresponds 
to a rejection-free algorithm for kinetic Monte Carlo simulation of rule-based 
models 021 [51], in which the exact rates of rules are calculated. This scheme 
becomes difficult to implement and computationally inefficient when predicates 
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Figure 6: Simulation of the model for the MAP kinase cascade with a scaffold 
of Fig. \5\ Results from ODE simulations (smooth curves) vs. those from kinetic 
Monte Carlo simulations (fluctuating curves), (a) Kinases bound to the scaffold 
M3K(s2 + S3) (solid line), M2K(s2 + S3) (dashed line) and MPK(s2 + S3) (dash- 
dot) each normalized by total number of kinases, (b) Phosphorylated kinases 
M3K(s3 + S4) (solid line), M2K(s3 + S4) (dashed line) and MPK(s3 + 54) (dash- 
dot) each normalized by total number of kinases. SCFtot — 1000, MSKtot — 
2000, M2Ktot = 1500 and MPKtot = 1000. Initially, aU machines are in state si. 
Kinetic parameters (fci to ki2, defined in Table [SJ are chosen for the purpose of 
illustration, not to model a specific system, ki ~ k^ = k^ — 1.66 x 10^^ nL-s~^, 
k2 ^ ki ^ kg — 1.0 s~^, ki = kg — kn — 3.0 s^^ and fcg = fcio = fci2 = 1-0 s~^. 



can be potentially affected by many types of reaction events. For example. Rule 
9 may be affected by events from eight other distinct rules, as indicated in the 
dependence matrix V below. When the algorithm executes an event defined 
by any of these eight rules, the algorithm also needs to evaluate whether the 
predicate of Rule 9 is affected. 

Reaction rules are usually coupled in most systems. In other words, an 
event from one rule may affect the rates of others. Such coupling relation- 
ships between rules can be summarized by a "dependency graph," or "influence 
map" [13,. For the example of the MAPK cascade model, we can summarize 
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the rule dependence in the form of the fohowing adjacency matrix: 



V 



llOOOOxOxOOO 
llOOOOxOxOOO 
00110000x0x0 
00110000x0x0 
0000110000x0 
0000110000x0 



11 







1 








0000000011x0 
00000000x1x0 
000000000011 



0000000000 



1 



(10) 



A matrix entry d.y at row i and column j represents the influence of an event 
from rule i on the rate of rule j . An entry with a boolean value 1 (0) indicates 
that an event in a rule has (does not have) an immediate impact on the rate of 
another rule, whereas x indicates that an event may or may not alter the rate of 
another rule, depending on whether the event changes the predicate of the other 
rule. For the example of the MAPK cascade, an event from Rule 1 can alter 
the rate of Rule 9 only in the case that a recruited M3K is phosphorylated and 
an unphosphorylated M2K is bound to the same scaffold agent. A dependency 
graph can in general be automatically derived by systematically analyzing a 
reaction rule set. In practice, the adjacency matrix "D can be made more quan- 
titative by replacing the Boolean value I's with pre-calculated values of rate 
changes. For example, the entry dgg (the influence of an event from Rule 8 on 
the rate of Rule 9) can be (conditionally) replaced by a numerical value, the 
value of — fcg (the minus sign indicates a reduction in the rule rate), because an 
M3K dephosphorylation may decrement the eligible population of M2K agents. 

To reduce bookkeeping, one may use a rejection algorithm [48' as an alterna- 
tive. This algorithm allows one to use rejection sampling to simplify the firing 
of reactions when state transitions are associated with predicate functions. In 
this algorithm, one calculates rule rates and samples participant MFA agents 
without considering constraints imposed by the predicate functions. Sampled 
trial agents are rejected for state transition if the predicate function does not 
evaluate as true. For example, the number of all MPK agents in state S2 can be 
used to calculate the rate of Rule 11 as Fn = fciiMPK(s2). Once an event from 
Rule 11 is sampled, a trial MPK agent in state S2 will be chosen, which will then 
undergo a state transition only if the transition condition is satisfied. Other- 
wise, the transition is rejected. Implementation of a rejection algorithm is easier 
compared to that of a rejection-free algorithm. The rejection algorithm enforces 
the conditions of state transitions only when a rule that has conditional tran- 
sitions is sampled. Our experience suggests that a rejection algorithm is more 
computationally efficient than a rejection- free algorithm as long as rejections do 
not comprise the vast majority of all Monte Carlo steps [35] • 

The sparsity of dependency matrix T) in Eq. (llOp (with many entries being 
zeros) indicates that rule couplings are largely localized. Similar to an efficient 
implementation of a conventional stochastic simulation algorithm for chemical 
reaction systems [53], for a large system specified by a considerable number of 
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rules, weak dependency between rules can be used to optimize the procedure 
for updating rule rates after each Monte Carlo step. 

5 Discussion 

Information processing in living cells can be viewed as protein computation, 
in which proteins act as computing machines that react to external signals 
by local computation |31[ 155) . Thus, a protein-protein interaction system is an 
integrative and distributed system with numerous computing devices interacting 
with each other under certain protocols. This perspective suggests that formal 
computing models can help archive, organize and interpret protein functions and 
their interactions. Formal structures also facilitate the process of computational 
modeling of complex and large-scale protein-protein interaction systems. 

In this work, we introduce an extension to the traditional computing model 
of finite state automata to describe protein behaviors in response to external in- 
puts and protein interactions. The MFA formalism offers a rule-based platform 
for modeling and simulating biochemical systems, especially for signal transduc- 
tion. An MFA is in essence a data structure that specifies protocols to define 
the activity of a protein in a discrete state space. An MFA can be used as a 
representation of knowledge or hypotheses about a protein and can serve as a 
building block for biomolecular interaction models. At the systems level, reac- 
tion rules connect separate MFA-represented proteins and model the dynamics 
of a protein interaction system as synchronized MFA state transitions. The 
MFA formalism allows for a clearer and more natural representation of proteins 
and the combinatorics of protein interactions. The MFA formalism adds in in- 
tuitive formulation of mechanistic models of signal transduction, which can be 
accessible for those who are familiar with the biological knowledge underlying 
the models. For quantitative model computation, as our example model of a 
MAP kinase cascade system demonstrates, ODEs for deterministic simulations 
can be constructed to track concentrations of machine states that often directly 
correspond to experimentally resolvable quantities. (For some states, the ODE 
solutions give approximations.) For exact and stochastic simulations, rule-based 
kinetic Monte Carlo methods can be applied. 

Formalisms derived from finite automata theory have been proposed earlier 
for applications in biology. Notably, Harel and coworkers [SS] have applied stat- 
echarts, a graphical and hierarchical (extended) finite automata structure [50], 
to model and simulate cell development and dynamics of cell populations. Re- 
cently, use of finite automata to model biomolecular interactions was studied by 
Cardelli [57] , in which the author introduced the concept of polyautomata. The 
concepts of polyautomata and MFA are closely related. In Ref. [57], polyau- 
tomata are used to represent SPiM scripts, which specify stochastic simula- 
tions via the stochastic pi-calculus approach implemented in the SPiM software 
tool [58] . Here, we have shown that MFAs can be used to specify both determin- 
istic and stochastic simulation approaches (Fig. 6). Cardelli ^7\ demonstrated 
that polyautomata are useful for modeling protein complex formation, including 
polymerization-like reactions. Here, we have shown that MFAs can be used to 
model protein complex formation as well as post-translational modifications of 
proteins, as illustrated in Figs. 3 and 5. Finally, the MFA formalism extends the 
polyautomata concept in an important way by allowing for the explicit represen- 



19 



tation of the functional components of proteins and the structural relationships 
among these components (Fig. 2). 

A potential strength of the MFA framework is the mature development 
and many applications of finite automata. Various forms of finite state au- 
tomata have been industry standards for modeling reactive systems for many 
years. In particular, MFAs are amenable to hardware implementations using 
programmable logic devices including widely-used programmable array logic 
(PAL), generic array logic (GAL) and field programmable gate array (FPGA) 
devices. In particular, because of the sequential nature of discrete event-driven 
simulation, the performance of simulating large-scale complex biochemical re- 
action systems by stochastic simulation is poor even if it is not prohibitive. 
Hardware implementation may yield an advantage in speed. The first hardware 
(FPGA) stochastic simulations of biochemical networks were implemented by 
Keane and co-workers fS^ and Salwinski and Eisenberg [5D]. Taking advantage 
of the parallel architecture of FPGA, implementations by Salwinski and Eisen- 
berg |60) allowed improvement in the speed of simulation of a simple bimolecular 
reaction by at least one order of magnitude compared to a conventional software 
implementation on a benchmarking platform. Recently, Yoshimi et al. |61| im- 
plemented the next reaction method of Gibson and Bruck [S3] in an FPGA-based 
simulator and were able to achieve a significant speed-up. Implementation of 
rule-based models into programmable circuits has yet to be realized, but finite 
state machines are routinely implemented in hardware. Hardware implemen- 
tation stores state variables and embeds state transition protocols into digital 
electronic circuits. In the design process, the MFA structures need to be trans- 
lated into binary logic to apply digital computing to achieve the defined machine 
dynamics. As proteins can be modeled as MFAs, in principle, the dynamics of 
a protein interaction network may be simulated by electronic circuits. 

Another potential use of the MFA structure is to archive proteins in terms of 
their reaction dynamics. Since an MFA is a standalone structure for storing the 
discrete dynamics of a protein, we expect that it can be used to systematically 
archive protein functions, with the MFA structure serving as an elementary 
record type for a database. Protein records in current protein databases are 
mostly annotations including amino acid sequences, functional domains, asso- 
ciated functions, etc. However, such information cannot be readily used to 
construct mechanistic biomolecular interaction models. The MFA structure of- 
fers an alternative way of storing protein records. Using a database with MFA 
records, one can construct a model by querying the database for MFA structures 
to obtain a set of desired molecular building blocks. One can then specify reac- 
tion rules to connect these machines. Such a rule-based model can be efficiently 
revised and incrementally refined when records of MFAs involved in the model 
are updated to reflect new knowledge. 
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